
Conversation

@sumedhsakdeo

What changes are proposed in this pull request, and why are they necessary?

This PR introduces comprehensive integration testing infrastructure for the Coral project that validates SQL translation capabilities across multiple query engines (Spark and Trino), with support for both Iceberg tables and Hive tables/views.

It also adds a sample test that can serve as a starting reference for adding more tests to the repo, making Coral's interoperability story even more robust.
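
For illustration, a test in the new module might look roughly like this. This is a sketch only: TestCoralTranslator, EmbeddedSpark, and EmbeddedTrino are placeholder helpers for the example, not the final API of the module.

```java
import java.util.List;

import org.testng.Assert;
import org.testng.annotations.Test;

public class HiveViewInteropTest {

  @Test
  public void testHiveViewRunsOnBothEngines() {
    // A Hive view defined over an Iceberg-backed table in the test metastore.
    String db = "coral_it";
    String view = "orders_summary";

    // Coral translates the view into engine-specific SQL
    // (e.g. via HiveToRelConverter plus CoralSpark / RelToTrinoConverter).
    String sparkSql = TestCoralTranslator.toSparkSql(db, view); // placeholder helper
    String trinoSql = TestCoralTranslator.toTrinoSql(db, view); // placeholder helper

    // The translated queries are executed on embedded engines and the
    // results are compared row by row, instead of only comparing SQL text.
    List<Object[]> sparkRows = EmbeddedSpark.run(sparkSql); // placeholder helper
    List<Object[]> trinoRows = EmbeddedTrino.run(trinoSql); // placeholder helper
    Assert.assertEquals(trinoRows, sparkRows);
  }
}
```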

How was this patch tested?

The patch only adds tests to the Coral repo.

@sumedhsakdeo sumedhsakdeo requested a review from aastha25 October 22, 2025 22:48
@wmoustafa
Contributor

Thanks Sumedh for the PR! Both coral-trino and coral-spark contain integration tests of this nature and also provide a consistent structure for supplying the input and asserting the output. Could you reuse them to add the specific tests or cases you want to add?

@sumedhsakdeo
Author

> Thanks Sumedh for the PR! Both coral-trino and coral-spark contain integration tests of this nature and also provide a consistent structure for supplying the input and asserting the output. Could you reuse them to add the specific tests or cases you want to add?

Thanks, Walaa. I did look at the tests in coral-trino and coral-spark; however, I was trying to create a test module where interoperability can be tested thoroughly, hence the need for a separate module. Also, the way things are set up, we need more flexibility in how we instantiate the HMS for Hive (the built-in one is OK) versus Iceberg (the built-in one is not).

I think there is value in this kind of "uber" integration test for interop testing, hence the addition.
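
To make the HMS point concrete, here is a rough sketch of the kind of Spark session setup the Iceberg side needs (standard Iceberg-on-Spark config keys; the catalog name, thrift URI, and warehouse path are placeholders):

```java
import org.apache.spark.sql.SparkSession;

public final class TestSparkSessions {

  // Rough sketch: Hive tables/views can get by with Spark's built-in,
  // Derby-backed metastore, but the Iceberg catalog below is pointed at a
  // standalone HMS, which is why the test infra needs control over how the
  // metastore is instantiated. The URI and warehouse path are placeholders.
  static SparkSession icebergSession() {
    return SparkSession.builder()
        .appName("coral-integration-test")
        .master("local[2]")
        .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.iceberg_cat", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.iceberg_cat.type", "hive")
        .config("spark.sql.catalog.iceberg_cat.uri", "thrift://localhost:9083")
        .config("spark.sql.catalog.iceberg_cat.warehouse", "/tmp/coral-it-warehouse")
        .enableHiveSupport()
        .getOrCreate();
  }
}
```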

@wmoustafa
Contributor

Can we unify things somehow? It is not clear to me when one should use which module. Also, there is a bunch of repeated code.

@sumedhsakdeo
Author

> Can we unify things somehow? It is not clear to me when one should use which module. Also, there is a bunch of repeated code.

Ack. I understand the concern about repeated code related to infra setup (Spark, HMS, Iceberg, etc.), and the question of why a standalone module for interoperability testing is key for a project like Coral. Here are some thoughts.

We could position the tests as follows:
- coral-trino, coral-spark: unit/module-level tests validating translation logic in isolation.
- coral-integration-test: end-to-end tests validating that translated queries actually execute correctly across engines and table formats (see the sketch after this list). This gives us a single place to test scenarios that span multiple engines and formats, instead of duplicating them in each module.
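
As a rough illustration of the split (TestCoralTranslator and EmbeddedTrino are placeholder helpers, and the expected SQL is made up for the example):

```java
import org.testng.Assert;
import org.testng.annotations.Test;

public class PositioningSketchTest {

  // Module-level style (what coral-trino / coral-spark do today):
  // assert on the translated SQL text in isolation.
  @Test
  public void testTranslatedSqlText() {
    String trinoSql = TestCoralTranslator.toTrinoSql("db", "my_view"); // placeholder helper
    Assert.assertEquals(trinoSql, "SELECT \"a\" FROM \"db\".\"base_table\""); // illustrative expectation
  }

  // coral-integration-test style: execute the translated SQL on an embedded
  // engine and assert on the rows the user would actually get back.
  @Test
  public void testTranslatedSqlExecutes() {
    String trinoSql = TestCoralTranslator.toTrinoSql("db", "my_view"); // placeholder helper
    Assert.assertFalse(EmbeddedTrino.run(trinoSql).isEmpty()); // placeholder helper
  }
}
```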

It has already started to prove effective for some of the test scenarios; see https://github.com/sumedhsakdeo/coral/pull/1/files.
The main thing I would like to avoid with this module is developers preparing SNAPSHOT jars and then running them manually in spark-shell / coral-tools; instead, they should be able to attach a debugger in IntelliJ and learn what is going on.

My worry is that if I bring the setup module into coral-trino and coral-spark, I will likely be replicating a lot more boilerplate code for Iceberg and HMS setup in both modules. Also, I would rather lean on mocking in the individual modules and on full-blown integration tests in this newly introduced module.

Wdyt?
